Techniques to improve coding agent velocity
April 3, 2026
This post is a grab bag of techniques to improve engineering velocity that I haven't seen discussed much online. Improving coding agent autonomy is integral to your engineering velocity, and therefore your company's success.[1]
Share transcripts across your team
Context about why certain design decisions were made is often available from past session transcripts. As your team scales, though, most of that context lives in sessions run on your teammates' machines, not yours. We built an internal tool that uploads past transcripts to a shared MCP server, and gave our coding agents a tool to query historical conversations.
Here's an example of this in practice: Sam and I were pairing on a memory leak bug, and I asked Claude Code to read his agent's transcripts to get up to speed.
We evaluated the effectiveness of this technique informally by asking Claude Code to fix the same (real) bug several times, comparing the trajectories of agents that had access to transcript sharing against agents that didn't. The shared transcripts comprised three previous Claude Code sessions in which we had unsuccessfully attempted to fix the bug.
We found that for Claude Code instances without access to transcript sharing, 48.6% of work was wasted re-investigating application-level fixes that earlier sessions had already tried and ruled out.
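Mechanically, the query side of this is simple. Below is a minimal, illustrative sketch of the kind of search tool we expose to the agent; the transcript format and function names are hypothetical, and a real deployment would sit behind the shared MCP server rather than a local function.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """One past coding-agent session (this format is hypothetical)."""
    author: str
    summary: str
    messages: list[str] = field(default_factory=list)


def search_transcripts(store: list[Transcript], query: str) -> list[Transcript]:
    """Naive keyword match over summaries and message bodies.

    A production version would use embeddings or full-text search, but
    the agent-facing contract is the same: query in, past sessions out.
    """
    q = query.lower()
    return [
        t for t in store
        if q in t.summary.lower() or any(q in m.lower() for m in t.messages)
    ]


# Example: an agent catching up on what was already tried for a memory leak.
store = [
    Transcript("sam", "memory leak in worker pool",
               ["tried bounding the queue; leak persisted"]),
    Transcript("sam", "flaky CI on main", ["retried the job"]),
]
hits = search_transcripts(store, "memory leak")
```

The important design choice is that the agent queries history itself, mid-session, instead of a human pasting context in.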
Claude Code arrives at the fix faster with transcript sharing:

| Metric | Without shared transcripts | With shared transcripts |
|---|---|---|
| Tool calls | 272 | ~137 |
| Agent turns | 123 | ~71 |
| Wasted actions | 192 | ~5 |
Remove blockers to autonomous action
Here are good reasons for a coding agent to stop looping:
- It's done with the task
- It needs clarification on the scope of the task
Here are bad reasons for stopping:
- It needs the user to unblock it
A general principle that I've found to be helpful is "if the agent asks me for something that it could theoretically know the answer to, make sure it has access to it next time". Here is a summary of common "preventable" reasons that a coding agent stops prematurely and how we've addressed them:
| Reason for stopping | Fix |
|---|---|
| Finishes designing UI and asks me how it looks | Playwright MCP / Chrome DevTools MCP so that it can test the UI autonomously |
| Needs access to production resources | Render / Neon MCP |
| Needs to test a "real-world" capability | Give it access to the real world (e.g. give it an email address) |
| Needs permission to do something dangerous (e.g. reading env vars) | Use auto mode; prompt it to never ask for permission |
Here are some examples of these fixes in practice:
- Autonomous UI testing with Playwright MCP: instead of asking "how does this look?", the agent takes screenshots and iterates
- Production access via Render MCP: instead of asking me to check prod, the agent pulls live metrics and logs directly
- Real-world testing via Gmail API: instead of asking me to send a test email, the agent does it itself
- Auto mode for dangerous operations: instead of asking permission to read env vars, the agent just checks
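All of these fixes are instances of the same check: before the agent surfaces a question to a human, ask whether an available tool could answer it. A toy sketch of that routing logic (the capability and tool names here are made up, not a real MCP registry):

```python
# Map from the capability a question needs to tools that would satisfy it.
# These names are illustrative only.
CAPABILITY_TOOLS = {
    "ui_feedback": {"playwright", "chrome_devtools"},
    "prod_access": {"render", "neon"},
    "real_world_email": {"gmail"},
}


def route_question(needed: str, available_tools: set[str]) -> str:
    """Return 'use_tool' if the agent can self-unblock, else 'ask_user'."""
    satisfying = CAPABILITY_TOOLS.get(needed, set())
    return "use_tool" if satisfying & available_tools else "ask_user"


# With Playwright available, "how does this look?" never reaches a human.
decision = route_question("ui_feedback", {"playwright", "render"})
# Without prod tooling, the agent still has to stop and ask.
fallback = route_question("prod_access", {"playwright"})
```

When the routing falls through to `ask_user`, that's the signal to add a tool so it doesn't happen next time.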
Across 738 transcripts, 11.4% ended with a preventable stop. 30% of stops are directly addressable by adding tools.[2]
Coding agents stop prematurely because they need permission and tools:

| Reason for stopping | Stops |
|---|---|
| Needs permission | 45 (54%) |
| UI feedback | 15 (18%) |
| Prod access | 9 (11%) |
| Other | 9 (11%) |
| Real-world test | 4 (5%) |
| Scope clarification | 2 (2%) |
Automatically auditing prod agent transcripts
Most teams evaluate prod agent outputs. Evaluating session transcripts in addition to agent outputs is useful, since coding agents can arrive at a correct-looking output despite hitting legitimate blockers along the way. For example, if you ask an agent to edit a PowerPoint and it's unable to make effective edits to the .pptx XML, it may resort to recreating the deck in HTML and then converting the HTML to .pptx.
To facilitate this kind of evaluation, we persist production transcripts and run coding agents that automatically audit every transcript our product generates, checking whether the agent got blocked or confused. They then compile a list for us to review.[3]
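The audit pass itself can be quite simple. In practice we let a coding agent judge each transcript, but a keyword-based sketch conveys the shape; the signal phrases below are invented for illustration.

```python
# Phrases that suggest the agent hit a blocker or worked around one.
# These signals are illustrative; a real auditor is an LLM judge.
BLOCKED_SIGNALS = (
    "permission denied",
    "unable to",
    "falling back to",
    "workaround",
)


def audit_transcript(messages: list[str]) -> list[str]:
    """Return the lines that suggest the agent was blocked or confused."""
    return [
        msg for msg in messages
        if any(sig in msg.lower() for sig in BLOCKED_SIGNALS)
    ]


def audit_all(transcripts: dict[str, list[str]]) -> dict[str, list[str]]:
    """Audit every transcript and keep only the ones that need human review."""
    report = {name: audit_transcript(msgs) for name, msgs in transcripts.items()}
    return {name: flags for name, flags in report.items() if flags}


report = audit_all({
    "session-1": ["Editing slide 3 XML",
                  "Unable to patch the .pptx XML",
                  "Falling back to rebuilding the deck as HTML"],
    "session-2": ["Ran tests", "All green"],
})
```

Here the PowerPoint-style workaround surfaces even though the session's final output looked fine, which is exactly what output-only evals miss.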
1. As a friend of mine once said, "we're all just playing Factorio now". ↩
2. "Tools" here includes resources, etc. ↩
3. You might wonder why we don't close the loop completely and automatically improve the product this way. The answer is that this can lead to the agent getting confused and suggesting unhelpful changes. I'd estimate that sometime this year this issue will disappear and this kind of autonomous improvement will become reliable. ↩